
Cancer Medicine

Wiley

Preprints posted in the last 7 days, ranked by how well they match Cancer Medicine's content profile, based on 24 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.

1
Prevalence and Clinical Significance of Adult-Onset Cancer Predisposition Variants in Pediatric Oncology

Maciaszek, J. L.; Pastor Loyola, V.; Cain, T.; Cardenas, M.; Blackburn, P. R.; Wilkinson, M. R.; Koo, S. C.; Wu, C.-H.; Li, C.; Wang, L.; Nichols, K. E.; Klco, J. M.; Eldomery, M. K.

2026-06-08 genetic and genomic medicine 10.64898/2026.06.07.26354365 medRxiv
Top 0.2%
4.3%

Purpose: Pathogenic or likely pathogenic (P/LP) variants are increasingly identified in genes more commonly associated with adult-onset cancer predisposition (aoCPG), but their prevalence and relevance in children who present with cancer remain unclear. Methods: We retrospectively analyzed 1,280 consecutive pediatric patients with cancer who underwent clinical germline sequencing using a virtual panel from 2021 to 2024. Genes with P/LP variants were categorized as aoCPG or pediatric-onset cancer predisposition genes (poCPG) according to cancer risk before age 18 years and pediatric surveillance recommendations. Variant relevance was adjudicated using tumor diagnosis/histopathology, immunohistochemistry, and tumor molecular features, and was classified as primary, secondary, or indeterminate. Results: Among 1,280 patients, 197 (15.4%) harbored 211 P/LP variants across 54 genes. Sixty-six variants (31.3%) occurred in aoCPG, 87 (41.2%) in poCPG, and 58 (27.5%) were heterozygous variants in autosomal recessive genes. Among adult-onset variants, 7 (10.6%) were primary, 54 (81.8%) secondary, and 5 (7.6%) indeterminate. Among pediatric-onset variants, 77 (88.5%) were primary and 10 (11.5%) secondary. Six patients (3 with adult-onset variants and 3 with pediatric-onset variants) received targeted therapy informed by germline/somatic sequencing results. Conclusion: In pediatric oncology, most variants in aoCPG are secondary findings rather than tumor-related findings. Tumor-informed interpretation, beyond variant classification, may improve reporting, counseling, and therapeutic decision-making.

2
Integrated T-Cell Receptor Repertoire and Tumor Immunogenicity Profiling Reveals Distinct Immunogenomic States in Endometrial Cancer

Aversa, I.; Abatino, A.; Isabello, A.; Gallo, R.; Isdraele, L.; Straface, T.; Zullo, F. M.; Guida, M.; Saccone, G.; Fiume, G.; Venturella, R.; Viglietto, G.; Cuda, G.; Costanzo, F.; Zullo, F.; Palmieri, C.

2026-06-10 oncology 10.64898/2026.06.08.26355191 medRxiv
Top 0.5%
2.3%

Background: Endometrial cancer exhibits marked molecular and immune heterogeneity that is only partially explained by established genomic biomarkers. We investigated whether T cell receptor (TCR) repertoire architecture captures complementary dimensions of antitumor immunity beyond conventional molecular classification. Methods: Paired tumor and peripheral blood samples from eight patients with molecularly characterized endometrial cancer underwent TCR repertoire profiling. Diversity, clonality, and tumor-blood overlap metrics were integrated with genomic variables, including tumor mutational burden (TMB), genomic instability metric (GIM), and POLE status. Principal component analysis and correlation analyses were used to identify the major dimensions of repertoire organization. Composite Immune Focusing and Immune Sharing Scores were derived to summarize dominant repertoire patterns. Results: The first two principal components explained 70.1% of total repertoire variance and revealed substantial heterogeneity independent of histological subtype. TMB was strongly associated with reduced repertoire diversity and increased clonal dominance, resulting in a robust association with the Immune Focusing Score (ρ = 0.88, p = 0.004). POLE-mutated tumors occupied the extreme end of this focusing continuum. In contrast, genomic instability was associated with increased tumor-blood repertoire overlap and preserved diversity, reflected by a strong correlation between GIM and the Immune Sharing Score (ρ = 0.76, p = 0.027). The two immune scores showed minimal correlation with each other (ρ = -0.24, p = 0.57), indicating that they capture largely independent aspects of immune organization. Conclusion: Integrative analysis of TCR repertoire architecture and tumor genomics identifies distinct immunogenomic states in endometrial cancer that are not fully captured by conventional molecular classification.
If validated in larger cohorts, immune focusing and immune sharing metrics may provide complementary biomarkers for patient stratification and immunotherapy-oriented precision oncology.

3
Calibrating trust in AI-assisted pituitary surgery

Hudson, G. R.; Khan, D. Z.; Fayez, F.; Bhatia, S.; Bano, S.; Costanza, E.; Blandford, A.; Stoyanov, D.; McCulloch, P.; Marcus, H. J.; University College London Collaborators

2026-06-04 surgery 10.64898/2026.06.02.26354735 medRxiv
Top 0.7%
1.8%

Background: Endoscopic endonasal transsphenoidal surgery (EETS) requires navigation around neurocritical anatomy. Artificial intelligence clinical decision support systems (AI-CDSSs) can now orientate surgeons, but clinician trust in AI remains poorly understood, limiting safe deployment. This study evaluates how modifiable interface design affects trust and performance in a real-world pituitary surgery AI-CDSS. Method: In an online study, 70 clinicians with pituitary surgery experience were randomised evenly to a Basic or an Enhanced AI-CDSS, each of which outlines the sella on EETS operative video. The Enhanced group additionally received an explanation of the model and previous publications, alongside confidence labels depicting outline reliability. Both groups annotated the sella on six video clips, first alone and then with the optional AI-CDSS. Clips were ordered by declining AI performance, except for the final clip. Self-reported trust was measured on a 1-7 scale after each annotation, and performance was the Dice overlap between user annotations and the ground truth. Comparisons used Mann-Whitney U tests and permutation analysis. Results: Sixty-four participants (91%) finished the exercise (31 Basic, 33 Enhanced). When AI performed best, median trust was 5.00 in both arms (U=559, p=.521). However, when AI performed worst, trust was significantly lower in the Enhanced group (3.00 vs 3.67, U=668, p=.035), and this difference was sustained in the final clip (3.67 vs 4.33, U=687, p=.019). User performance improved with the AI-CDSS, but with no significant difference between the groups on the best or worst AI-performing clips. Nevertheless, for the best AI, senior clinicians had higher median performance in the Enhanced group (0.95 vs 0.90, U=75, p=.066). There was also less dispersion in the Enhanced group when AI was inaccurate (IQR: 0.07 vs 0.21, p=.004).
Conclusion: Interface design can improve trust calibration in a surgical AI-CDSS and may improve performance among senior clinicians when AI is accurate, and consistency when AI is inaccurate. In future, these features may form important safety checks during translation to the operating room.
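Performance in this study is the Dice overlap between a user's sella annotation and the ground truth. A minimal sketch of that metric, assuming both annotations have already been rasterised into equally sized binary masks (the abstract does not describe how outlines were converted to pixels); the function name and toy masks are illustrative only.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks.

    Masks are equally sized lists of rows containing 0 (background) or 1 (sella).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of rows")
    inter = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            raise ValueError("masks must have the same number of columns")
        for pa, pb in zip(row_a, row_b):
            size_a += pa
            size_b += pb
            inter += pa * pb  # 1 only where both masks mark the pixel
    if size_a + size_b == 0:
        return 1.0  # both masks empty; treating this as perfect agreement is a convention
    return 2 * inter / (size_a + size_b)


# Toy 2 x 3 masks (illustrative values, not study data).
user = [[0, 1, 1],
        [0, 1, 1]]
truth = [[0, 0, 1],
         [0, 1, 1]]
print(round(dice(user, truth), 3))  # 2*3 / (4 + 3) ≈ 0.857
```

Dice rewards overlap relative to the combined size of both outlines, so a user who over-draws the sella is penalised even when the true region is fully covered.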

4
Using colorectal cancer screening evidence to stratify for personal risk among those with a family history of colorectal cancer: a 42-year cohort study

King, D. W.; King, P. E.; Blanchard, M. W.; Ning, N. W.; King, S. K.; Grimm, M. C.; Ha, T.; Eagar, K.

2026-06-08 health systems and quality improvement 10.64898/2026.06.04.26354891 medRxiv
Top 0.7%
1.7%

Objective: To determine whether individual patient risk of developing colorectal cancer (CRC) can be assessed in people at high risk because of their family history. Design/Method: Retrospective observational study of prospectively collected data from consecutive patients referred for a colonoscopy. A total of 2,478 consecutive patients were referred to a single colorectal surgical practice in Sydney, Australia, between 1977 and 2018 for a colonoscopy because of a family history of CRC. Of these, 1,963 have been followed for more than 10 years and are the subject of this paper. Histopathological findings were categorised as normal (N), non-advanced adenoma (NAA), or advanced neoplasia (AN), with AN being a proven precursor of CRC. Intervention: Colonoscopic screening according to contemporary practice until 2006 and subsequently according to Australian National Health and Medical Research Council guidelines. Results: Participants with normal or low-risk findings in the first decade remained at lower risk of CRC for 30 years from the commencement of screening. Conclusion: It is possible to stratify individual patients in a high relative risk cohort into those with high or low personal risk of CRC based on colonoscopic findings in the first 10 years of surveillance. Those with no AN in the first 10 years have a lower 30-year risk of developing AN than the general community. This offers the possibility of structuring surveillance programs around individual risk rather than group risk, reducing the need for multiple surveillance colonoscopies in most such patients and improving the cost-effectiveness of CRC screening at the population level.

5
A Comparison of Manual and Automated Approaches to Developing Computable Algorithms for Identifying Acute Pancreatitis

Bann, M. A.; Carrell, D. S.; Gruber, S.; Heagerty, P. J.; Williamson, B. D.; Nelson, J. C.; Hazlehurst, B.; Felcher, A.; Nyongesa, D. B.; Slaughter, M. T.; Sapp, D. S.; Cronkite, D. J.; Ball, R.; Floyd, J. S.

2026-06-08 health informatics 10.64898/2026.06.05.26354934 medRxiv
Top 0.8%
1.5%

Objective: Clinical phenotyping methods that rely on clinical and informatics expertise can be time-intensive and costly. We tested both manual and highly automated approaches using electronic health record (EHR) data to identify an FDA Sentinel Initiative health outcome of interest, acute pancreatitis. Materials and Methods: We trained and evaluated machine learning algorithms using EHR data with two approaches: a custom approach that included manually curated features and was trained on outcomes data validated with medical record review, and a highly automated approach that greatly simplifies and automates feature engineering and relies on low-cost silver-standard outcomes for model training. Results: Custom algorithms using manually curated structured claims data discriminated cases from non-cases with a high degree of accuracy (cv-AUC 0.89 [95% CI 0.84-0.94]); the inclusion of natural language processing (NLP)-derived covariates from clinical notes increased performance slightly (cv-AUC 0.91 [95% CI 0.86-0.97]). The automated algorithm trained on the outcome count of diagnosis codes performed less well (AUC 0.80 [95% CI 0.75-0.85]) but improved when maximum lipase value was used as the outcome (AUC 0.88 [95% CI 0.84-0.92]). At a positive predictive value of 90%, the custom algorithm had a sensitivity of 92%, the automated algorithm trained on diagnosis code count had a sensitivity of 45%, and the automated algorithm trained on maximum lipase value had a sensitivity of 84%. However, a prediction rule derived by clinicians during chart review was nearly as accurate (maximum lipase value ≥ 3 times the upper limit of normal; AUC 0.86, PPV 85%, sensitivity 92%). Discussion: Machine learning algorithms with manually curated structured data and NLP features trained on validated outcomes data successfully identified validated events.
Using an outcome in the automated model based on specific phenotype knowledge (maximum lipase value) achieved performance similar to that of the custom model with considerably fewer resources.
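The clinician-derived rule above (maximum lipase ≥ 3 times the upper limit of normal) is simple enough to encode directly. The sketch below applies it to made-up values and computes sensitivity and PPV against made-up chart-review labels; the ULN of 60 U/L, the function names, and every data point are hypothetical (lipase reference ranges differ between laboratories).

```python
def rule_positive(max_lipase, uln):
    """Hypothetical encoding of the rule: maximum lipase >= 3 x upper limit of normal (ULN)."""
    return max_lipase >= 3 * uln


def sensitivity_ppv(predicted, actual):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


ULN = 60.0  # hypothetical lipase ULN in U/L; real reference ranges vary by laboratory
MAX_LIPASE = [250, 400, 90, 1000, 150, 70, 200, 160]  # made-up values
IS_CASE = [True, True, False, True, False, False, False, True]  # made-up chart-review labels

flags = [rule_positive(v, ULN) for v in MAX_LIPASE]
print(sensitivity_ppv(flags, IS_CASE))
```

With a cut-off of 180 U/L, one non-case at 200 U/L is flagged and one case at 160 U/L is missed, giving a sensitivity and PPV of 0.75 each in this toy set.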

6
Formalising Limits of Circulating Tumour DNA Detection: A Signal Detection Framework for Clinical Threshold Specification

Walinjkar, A.

2026-06-10 oncology 10.64898/2026.06.08.26355204 medRxiv
Top 0.9%
1.5%

Background: Circulating tumour DNA (ctDNA) liquid biopsy is now established across oncology for early cancer detection, minimal residual disease surveillance, and treatment monitoring. Detection thresholds for all current ctDNA assays are derived empirically through receiver operating characteristic analysis on training cohorts, a statistically valid but theoretically uninformed approach that does not specify the minimum detectable tumour fraction given assay technical characteristics, nor identify when increasing sequencing depth ceases to provide additional clinical information. Methods: We model ctDNA detection as a binary hypothesis testing problem with Binomial-distributed mutant allele counts against a sequencing error noise floor. The Neyman-Pearson lemma is applied to derive the uniformly most powerful detector and the minimum detectable tumour fraction in closed form. The sequencing assay is modelled as a binary symmetric channel and Shannon channel capacity is calculated. Empirical validation uses n=61 data points extracted from five published peer-reviewed analytical validation studies across five independent institutions in the US and EU (2018–2025): Yu et al. 2022, Stetson et al. 2018, Frydendahl et al. 2023, Northcott et al. 2024, and Cheng et al. 2025. Results: The minimum detectable tumour fraction is derived in closed form as f_min ≈ (z_α + z_β) √(ε/N), where N is sequencing depth, ε is the platform error rate, and z_α, z_β are standard normal quantiles at the specified false positive and false negative rates. Shannon channel capacity is C = 1 − H(ε) bits per read, where H(ε) is the binary entropy. Empirical validation yields 84.3% agreement for single-locus assays.
Discordance for multi-locus tumour-informed assays (NeXT Personal, duplex WGS) is consistent with the single-locus model scope and identifies the principal theoretical extension required. Conclusions: This framework provides the first formal Neyman-Pearson optimality proof for ctDNA detection, a closed-form detection limit, and a platform-independent efficiency metric for NHS and regulatory standardisation. Keywords: circulating tumour DNA; liquid biopsy; Neyman-Pearson detection; Shannon channel capacity; sequencing depth; limit of detection; minimal residual disease; signal detection theory
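The two closed-form results can be evaluated directly. A minimal sketch, assuming one-sided quantiles (z_α = Φ⁻¹(1 − α), z_β = Φ⁻¹(1 − β)) and illustrative defaults α = 0.05 and β = 0.20; the quantile convention, defaults, and function names are assumptions, not the author's stated settings.

```python
from math import log2, sqrt
from statistics import NormalDist


def f_min(depth, error_rate, alpha=0.05, beta=0.20):
    """Minimum detectable tumour fraction, f_min ≈ (z_alpha + z_beta) * sqrt(error_rate / depth).

    Assumes one-sided quantiles: z_alpha = Phi^-1(1 - alpha), z_beta = Phi^-1(1 - beta).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    return (z_alpha + z_beta) * sqrt(error_rate / depth)


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def capacity(error_rate):
    """Binary symmetric channel capacity, C = 1 - H(epsilon), in bits per read."""
    return 1.0 - binary_entropy(error_rate)


for depth in (10_000, 40_000):
    print(depth, f"{f_min(depth, 1e-4):.2e}")
print(f"{capacity(1e-3):.4f}")
```

At N = 10,000 and ε = 10⁻⁴ this gives f_min of roughly 2.5 × 10⁻⁴ (about 0.025% tumour fraction), and quadrupling depth only halves it, which is the diminishing-returns behaviour implied by the √(ε/N) term.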

7
Quantifying Cancer Clinical Trial Eligibility Using Artificial Intelligence-Based Matching

Goel, K. P.; Myall, N. J.; Dickerson, J.; Caswell-Jin, J. L.; Johnson, T.; Worth, J. E.; Gensheimer, M. F.

2026-06-05 oncology 10.64898/2026.06.03.26354859 medRxiv
Top 0.9%
1.4%

PURPOSE: To develop and validate an artificial intelligence-enabled platform that converts unstructured cancer trial eligibility criteria into structured queries and quantifies trial eligibility across advanced/metastatic cancer trials. METHODS: We downloaded actively recruiting US interventional treatment trials for advanced/metastatic breast cancer, colon cancer, and non-small cell lung cancer from ClinicalTrials.gov. Medical oncologists created 24 synthetic patient vignettes. A large language model converted trial eligibility criteria into Structured Query Language (SQL) code and patient information into structured records, enabling automated matching. Cancer details and treatment history were considered, but not laboratory results or comorbidities. Validation included physician editing of generated eligibility code for 30 trials, and blinded physician eligibility assessment for five trials. We then evaluated how age, ECOG performance status, sex, and ZIP code affected the number of eligible trials. RESULTS: Of 833 candidate trials, 746 met inclusion criteria. In physician review of 30 trials, edits to generated SQL did not change any of 720 trial-patient eligibility determinations for 24 synthetic patients. In blinded validation across 120 trial-patient pairs, automated matching achieved 97% accuracy. Across synthetic patients, eligible trials ranged from 31 to 258 when there were no geographic restrictions. Eligibility decreased markedly with worse performance status and with geographic restriction (both p<0.001). Later-phase, randomized, and molecularly selective trials had fewer eligible patients. CONCLUSION: AI-based structuring of trial eligibility criteria can support accurate, scalable measurement of potential cancer trial eligibility. In this demonstration, performance status, geography, and age were major determinants of eligibility across the active metastatic trial landscape.
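The platform turns eligibility criteria into SQL and patients into structured records, so matching becomes a query. A minimal sketch of that final matching step with Python's built-in sqlite3 module; the table layout, the three patients, and the trial criteria are all hypothetical, and the criteria strings stand in for LLM-generated SQL that, as in the study's validation, a physician would review before use.

```python
import sqlite3

# Hypothetical structured records: (id, age, ecog, cancer_type, prior_lines).
PATIENTS = [
    (1, 64, 1, "NSCLC", 1),
    (2, 71, 2, "NSCLC", 0),
    (3, 52, 0, "breast", 2),
]

# Hypothetical trial criteria, already converted to SQL WHERE clauses.
TRIALS = {
    "TRIAL-A": "cancer_type = 'NSCLC' AND age >= 18 AND ecog <= 1",
    "TRIAL-B": "cancer_type = 'breast' AND prior_lines >= 1 AND ecog <= 2",
    "TRIAL-C": "ecog <= 2 AND age < 75",
}


def eligible_trials_per_patient(patients, trials):
    """Return {patient_id: number of trials whose criteria the patient satisfies}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER, "
            "ecog INTEGER, cancer_type TEXT, prior_lines INTEGER)"
        )
        con.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?)", patients)
        counts = {row[0]: 0 for row in patients}
        for where_clause in trials.values():
            # Criteria are trusted, reviewed strings here; never splice
            # unreviewed text into SQL in a real system.
            for (patient_id,) in con.execute(f"SELECT id FROM patients WHERE {where_clause}"):
                counts[patient_id] += 1
        return counts
    finally:
        con.close()


print(eligible_trials_per_patient(PATIENTS, TRIALS))
```

Patient 2 misses TRIAL-A only because of ECOG 2, a miniature version of the abstract's observation that worse performance status reduces the number of eligible trials.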

8
Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv
Top 1.0%
1.3%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet rates of documented clinical genetic testing differed by ancestry. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
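Carrier prevalence here is reported with Wilson 95% CIs. A minimal sketch of the Wilson score interval using only the Python standard library; the function name and the 50-of-1,000 example are illustrative and not drawn from the study.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


lo, hi = wilson_ci(50, 1000)
print(f"{lo:.4f}-{hi:.4f}")
```

For 50 of 1,000 it returns roughly (0.038, 0.065). Unlike the simple Wald interval, it never extends below 0 or above 1 and does not collapse to zero width when no successes are observed, which matters for rare-carrier proportions like these.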

9
A liquid biopsy-centered, pan-cancer, open next generation sequencing panel to support clinical decision-making (LION panel)

Feierabend, S.; Künstner, A.; Forster, M.; Helbing, T.; Gebauer, N.; Gemoll, T.; Axt, F.; Nimmagadda, S. C.; Ranganathan, L.; Schwandt, J.; Heber, M.; Szymczak, S.; Hohensee, I.; Fliedner, S. M. J.; Scherer, F.; Oberländer, M.; Derer-Petersen, S.; Busch, H.; von Bubnoff, N.; Dazert, E.

2026-06-08 oncology 10.64898/2026.06.05.26354976 medRxiv
Top 1%
1.2%

Cancer treatment has shifted toward personalized therapy based on molecular profiling, particularly in advanced disease. Existing circulating tumor DNA panels are often broad, generating many non-actionable variants and incurring costs that limit routine use in molecular tumor boards. We developed and validated a manufacturer-independent, 109-gene liquid biopsy-centered pan-cancer open next generation sequencing panel (LION panel), combined with an in-house bioinformatic pipeline to support clinical decision-making. A total of 87 samples were analyzed, including 17 reference samples, 21 healthy blood donor controls, and 49 patient samples spanning nine tumor entities. The LION panel achieved 92% sensitivity and 99% specificity in reference samples, with high concordance to digital droplet PCR (r = 0.99). It detected variant allele frequencies as low as 0.05% (tumor-informed) and 0.5% (tumor-uninformed). Clinical concordance reached 82% with blood-based digital droplet PCR and 75% with whole exome tissue sequencing. In representative cases, variant dynamics correlated with disease progression and revealed additional targetable variants. Overall, the LION panel supports clinical decision-making by enabling identification of targetable variants, disease monitoring, and detection of treatment resistance, particularly when tumor tissue is unavailable.

10
Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS): protocol for a multicountry novel artificial intelligence-based imaging device

Odeny, T. A.; Adhiambo, H. F.; Mangale, D.; Makanga, P. K.; Odeny, B.; Okuku, F.; Zhou, C.; Geng, E.; Carson, J.; Mudhune, V.; Bukusi, E.; Semeere, A.

2026-06-04 oncology 10.64898/2026.06.03.26354815 medRxiv
Top 1%
1.0%

Background: Kaposi sarcoma (KS) is the most common cancer among men in several Eastern African countries, yet treatment monitoring relies on imprecise, time-consuming ruler-based measurements defined by the AIDS Clinical Trial Group (ACTG). This method suffers from inter-observer variability, fails to capture lesion height or true geometric area, and performs poorly on dark skin. SkinScan3D (SS3D) is a portable, low-cost, AI-enabled 3D imaging device that provides objective measurements of KS skin lesion area, height, volume, and color. The Precision Imaging to Evaluate Kaposi Sarcoma (PRIME-KS) study evaluates whether SS3D provides more reproducible and accurate lesion measurements than the standard method, and validates its integration into routine clinical workflows in Kenya and Uganda. Methods: PRIME-KS is a multicountry prospective mixed-methods study with two clinical objectives. Objective 1 is a cross-sectional diagnostic accuracy study comparing SS3D with ruler-based measurement in 50 adults with KS (150 lesions) across sites in Kenya and Uganda. Two clinicians independently measure three lesions per participant using both methods. The primary outcomes are the concordance correlation coefficient (CCC) for inter-rater reproducibility and the coefficient of determination for accuracy. Objective 2 is a non-randomized before-and-after pilot study in 100 patients at three sites, evaluating device usability, acceptability, appropriateness, and feasibility using validated instruments, along with time-and-motion studies and activity-based micro-costing. Prior to these clinical objectives, a formative study used focus group discussions, discrete choice experiments, and human-centered design workshops to refine the SS3D device and protocols with end-user input. Discussion: PRIME-KS will provide the first rigorous evaluation of a 3D imaging device for monitoring KS treatment response in routine clinical settings.
If SS3D demonstrates superior reproducibility and clinical utility, it could reduce unnecessary chemotherapy exposure and associated toxicities by enabling earlier, more objective assessment of treatment response. Trial registration: ClinicalTrials.gov NCT06898203, registered 27 March 2025. Pan African Clinical Trials Registry PACTR202603523439856. Keywords: Kaposi sarcoma, SkinScan3D, 3D imaging, treatment monitoring, diagnostic accuracy, implementation science, usability, human-centered design, Kenya, Uganda
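Inter-rater reproducibility in PRIME-KS is measured with the concordance correlation coefficient (CCC). The protocol abstract does not say which estimator will be used; this sketch assumes Lin's original moment-based form with 1/n variances, and the rater values are hypothetical.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements.

    CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both raters report one identical constant")
    return 2 * sxy / denom


# Hypothetical lesion areas (cm^2); rater 2 reads 1 cm^2 higher on every lesion.
rater_1 = [1.0, 2.0, 3.0]
rater_2 = [2.0, 3.0, 4.0]
print(round(lin_ccc(rater_1, rater_2), 3))
```

The two raters here are perfectly correlated (Pearson r = 1), yet the CCC is 4/7 ≈ 0.571 because the constant offset counts as disagreement, which is why CCC rather than a plain correlation suits a reproducibility outcome.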

11
Epidemiology of Cervical Precancerous Lesions: Prevalence and Predictors from Pap Smear Screening in Hawassa City Hospitals, Sidama Region, Ethiopia. Institutional-Based Cross-sectional Study

Fisshatsion, A. B.; Zewude, Y. A.; Nisro, A. M.; Abebe, R. F.

2026-06-10 public and global health 10.64898/2026.06.09.26355254 medRxiv
Top 1%
0.8%

Background: Cervical cancer is the fourth most common cancer in women worldwide and remains a major public health challenge. In Ethiopia, it is the second leading cause of cancer deaths, with around 8,000 new cases and 6,000 deaths each year. Region-specific data on the prevalence and predictors of precancerous lesions remain scarce, yet such information is vital for guiding targeted reproductive health strategies. This study therefore examined the prevalence and predictors of cervical precancerous lesions among women aged 21-60 years undergoing Pap smear screening in public hospitals in Hawassa City, Sidama Region. Methods: An institution-based cross-sectional study was conducted among 241 women attending Pap smear screening at public hospitals in Hawassa City from March to August 2025. Sociodemographic and clinical data were collected via interviews and medical records. Lesions were classified according to the Bethesda system, the standardized international framework for reporting cervical cytology results from Pap smears. Multivariable logistic regression identified predictors (significance at p<0.05). Result: Of 241 women screened (mean age 35.3 years), cervical epithelial abnormalities were detected in 52 (prevalence 21.6%). Atypical squamous cells of undetermined significance was the most common abnormality (16.6%). Multivariable analysis showed that HIV infection was significantly associated with precancerous lesions (AOR = 3.7, 95% CI: 1.69-8.12, p<0.05), while hormonal contraceptive use was protective (AOR = 0.27, 95% CI: 0.11-0.67, p<0.05). Conclusion: These results underscore the urgent need to strengthen cervical cancer prevention through targeted screening and early intervention. Integrating routine HIV testing with Pap smear programs would be especially valuable. Health authorities should expand accessible screening for women aged 21-60, with particular attention to those living with HIV, to help reduce the burden of precancerous lesions.

12
Combining centralized and decentralized approaches to assess and ensure data quality in Eurocrine® via Microsoft Power BI and DataquieR

Musholt, T. J.; Clerici, T.; Bergenfelz, A.; Schmidt, C. O.; Struckmann, S.

2026-06-05 health informatics 10.64898/2026.06.04.26354884 medRxiv
Top 2%
0.6%

Background: Medical registries have gained importance in the evaluation of healthcare quality outcomes. In the absence of high-quality evidence, such as randomized controlled trials, studies based on registry data are essential for informing clinical guidelines. Methods for assessing data quality are rarely described in detail. To ensure the credibility of registry-based studies, registries must use all available technical and operational means to guarantee high data quality. Method: Eurocrine® is a pan-European endocrine surgical database and quality registry, initially funded by the EU healthcare programme, which started in 2015 and included more than 200,000 interventions as of April 2025. To ensure high data quality, interactive and standardized reports are generated via Microsoft Power BI, both centrally and locally. In addition, comprehensive data quality analyses were performed via the R-based package dataquieR. Results: Although a multitude of technical measures (for example, input screen design and real-time plausibility checks during data entry) are in place, they are not sufficient to prevent human errors at data entry. Errors identified in the reports were corrected, and preventive measures were implemented. Overall, the data quality was assessed as very good in terms of completeness, accuracy, and consistency. Conclusion: It is important to provide registry users with an efficient tool to identify data issues, as they have the clinical information needed to correct them. Data quality reports generated with dataquieR represent an effective tool for registry administrators. Predesigned Microsoft Power BI reports enable participating Eurocrine® clinics to self-audit their data.

13
When Algorithms Prescribe: A Cross-Sectional Study of Quality, Misinformation, and Engagement in Statin-Related Content on TikTok

Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.

2026-06-08 health informatics 10.64898/2026.06.04.26354962 medRxiv
Top 2%
0.5%

Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.

14
Breast cancer polygenic risk score performance varies by socioeconomic status

Domian, H. I.; Tian, X.; Ong, D.; Hamilton, L.; Shieh, Y.; Musharoff, S. A.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354819 medRxiv
Top 2%
0.4%

Background: Polygenic risk scores (PRS) for breast cancer are increasingly used for risk stratification to inform screening and prevention. However, for PRSs to be equitable and clinically useful, they need to perform well across diverse populations. While PRS performance is known to be ancestry-dependent, it is not well understood how environmental context, such as socioeconomic status (SES), affects PRS transferability. Here, we assess whether SES, measured via self-reported household income, modifies breast cancer PRS performance and, if so, whether socioeconomic context contributes predictive information beyond genetic risk alone. Methods: We used the US-based All of Us biobank to evaluate how SES impacts breast cancer PRS performance. First, we quantified changes in breast cancer PRS performance by modeling a commonly cited polygenic score for breast cancer, previously described by Mavaddat et al., together with SES. We then re-estimated the genetic effect sizes of the 3,820 variants from Mavaddat et al. in All of Us with and without income as a covariate. Because social determinants of health affect breast cancer detection and outcomes, we stratified analyses by socially defined populations on the basis of self-identified race and ethnicity. We further stratified individuals whose self-identified race is White ("White") into three SES groups (high, middle, low) based on self-reported income and re-estimated genetic effect sizes to create SES-specific PRSs. We then applied these PRSs to White participants, the largest group in the study, and to Black or African American ("Black") and Hispanic or Latino ("Hispanic") participants, groups underrepresented in breast cancer research. Model discrimination between cases and controls was measured by area under the curve (AUC).
Results: We analyzed 163,715 women from the All of Us biobank, including 8,833 breast cancer cases (6,619 White, 1,178 Black, and 1,036 Hispanic), with relative income available for a subset of these cases (5,525 White, 848 Black, and 566 Hispanic). The ancestry-dependent performance of the breast cancer PRS described by Mavaddat et al. was replicated in All of Us. In Black individuals, adding this PRS to an age-only model (AUC [95% CI]: 0.576 [0.571, 0.582]) produced an increase in AUC similar to that from adding relative income (AUC: 0.573 [0.568, 0.577]). Adding income to a model with PRS, age, and genetic PCs 1-3 improved AUC by 0.007 in White Americans and 0.018 in Black Americans (both p < 10^-11), while attenuating the contribution of PRS in the full model. PRS performance also varied among SES categories. Notably, PRSs with variant effect sizes recalibrated in low-SES White participants performed best in low-SES White participants (AUC: 0.605 [0.583, 0.628]) and Black Americans (AUC: 0.588 [0.586, 0.591]), both higher than the AUCs observed in high-SES White Americans (AUC: 0.579 [0.577, 0.580]) and middle-SES White Americans (AUC: 0.578 [0.569, 0.586]). Conclusion: Socioeconomic context, measured by income, significantly impacts the transferability of a PRS for breast cancer within and among groups defined by self-identified race and ethnicity. Accounting for SES improves PRS performance, most notably in Black Americans and low-SES White individuals.
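The two quantities this abstract compares can be made concrete with a small sketch. It assumes the usual additive PRS (a weighted sum of risk-allele dosages, with per-variant log odds ratios as weights) and computes AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The function names are hypothetical; this is not the authors' pipeline, and it omits the covariates (age, income, genetic PCs) used in their models.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Additive PRS: sum over variants of risk-allele dosage (0, 1, or 2) times effect size."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(d * b for d, b in zip(dosages, betas))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as one half (Mann-Whitney form)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Re-estimating effect sizes in a subgroup (for example, low-SES participants) amounts to scoring the same genotypes with a different `betas` vector and comparing the resulting AUCs. The pairwise loop is O(cases × controls), which is fine for illustration; at biobank scale a rank-based computation is the practical choice.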

15
Prediction of immunotherapy response using live tumor fragments from routine clinical biopsies

Braun, D.; Dana, N.; Hernan, H. R.; Sahni, S.; Scribano, C.; Johnson, C.; Vedder, L.; von Euw, E.; Zweng, J.; Wargowski, E.; Sunil, A.; Sharma, D.; Routh, J.; Rexroad, K.; McDonnell, P.; Jergens, V.; Costa, C.; Zuniga, R.; Toia, G. V.; Patel, P. M.; Martin, R. C. G.; Majeed, U.; Mukhopadhyay, D.; Lou, Y.; Kokabi, N.; Jakub, J. W.; Hays, D.; Godwin, A. K.; Giffi, V.; Gelbard, A.; Friedl, A.; Duimstra, E. K.; Dronca, R. S.; Chen, R.; Chalfin, H.; Broome, B.; Babiker, H. M.; Chandra, T.; Caenepeel, S.; Hrycyniak, L. C. F.; Sood, C.; Ramos, H.; Patel, P.; Advani, P.; Gierman, H. J.; Taube, J.

2026-06-10 oncology 10.64898/2026.06.05.26354635 medRxiv
Top 2%
0.4%

Functional ex vivo assays using live tumor tissues have demonstrated strong predictive accuracy for response to immune checkpoint inhibitors (ICIs) but are not scalable, because they require manual processing of large resections collected at academic centers. Here, an ex vivo live tumor fragment (LTF) platform was developed using standard-of-care biopsies from 228 patients with suspected malignancy, collected across prospective, multicenter observational trials and biobanks. Hierarchical clustering of ICI-mediated changes in cytokine production identified two groups: responders and nonresponders. A binary classifier (the elive index) based on 8 cytokines achieved an AUC of 0.99 for cluster prediction. The elive index correctly predicted clinical benefit in 93% (26/28) of patients (P = 3.2 × 10^-5) and accurately identified 83% (10/12) of objective responders. Critically, elive responders were identified among biomarker-negative patients, highlighting the platform as a scalable approach that complements existing companion diagnostics and expands the population of patients identified as likely to benefit from ICI therapy.

16
Study Design Indexing in Transition: A Focused Comparison of manual NLM Indexing vs. Transformer-based Automated Models

Das, P.; Schneider, J.; Mayo-Wilson, E.; Kilicoglu, H.; Menke, J. D.; Nam, D.; Ninan, K.; Oberste, J.-P.; Troy, A. M.; Ying, X.; Holt, A. W.; Smalheiser, N. R.

2026-06-04 health informatics 10.64898/2026.06.03.26354854 medRxiv
Top 2%
0.3%

Objectives: Study design indexing of biomedical publications is crucial for evidence retrieval and synthesis. We sought to evaluate the accuracy and suitability of a transformer-based model (TM) for indexing clinical study designs, in comparison with National Library of Medicine (NLM) indexing. This comparison is challenging for at least three reasons. First, to date, all automated systems have been trained and evaluated on manual NLM indexing assignments, which are themselves subject to errors. Second, TM's probabilistic predictive scores reflect uncertainty and can be converted to TRUE/FALSE assignments in different ways depending on users' needs, whereas NLM labels are categorical. Third, our goal (to tag only articles that exhibit a given design) differs from that of NLM, which tags articles that discuss a design as well as those that exhibit it. Materials and Methods: Therefore, we carried out a limited evaluation of the TM that focuses only on articles that received the most confident predictions (that is, the highest scores, which are almost certainly TRUE, and the lowest scores, which are almost certainly FALSE) but that disagreed with NLM assignments. This was performed for articles published both in 2016 (when NLM decisions were manual) and in 2025 (when NLM decisions were automated). To establish ground truth, two annotators independently indexed the articles, following written definitions, for four prominent study designs: cohort, case-control, cross-sectional, and case report. Results: For three designs (case-control, case report, cross-sectional), the articles with the top 100 TM scores (among those to which NLM did not assign that design) were judged to exhibit that design in the great majority (86-100%) of cases. Conversely, the articles with the lowest 100 TM scores (among those to which NLM did assign the design) exhibited the design in relatively few (0-21%) of cases.
The most confident TM predictions were highly accurate and not redundant with automated NLM indexing; the exception was cohort study articles, for which both TM and NLM labels showed high rates of both omission and commission errors. Discussion and Conclusion: TM may have value for identifying articles that exhibit specific study designs, which is especially important for clinical decision-making as well as for systematic reviews and other evidence syntheses. NLM indexing of cohort studies cannot be regarded as a reliable gold standard for training or evaluating automated systems, warranting efforts to create a new manually annotated corpus.
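The evaluation design described in this abstract, which keeps only the most confident TM predictions that contradict NLM, can be sketched as a simple selection step. The sketch assumes each article carries a TM probability score and a boolean NLM label for one study design; the `Article` type, its field names, and the default `n=100` (taken from the abstract's top-100/bottom-100 samples) are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Article:
    pmid: str          # hypothetical identifier field
    tm_score: float    # TM probability that the article exhibits the design
    nlm_tagged: bool   # whether NLM assigned the design to this article


def confident_disagreements(
    articles: List[Article], n: int = 100
) -> Tuple[List[Article], List[Article]]:
    """Return (highest-scoring articles NLM did NOT tag, lowest-scoring articles NLM DID tag)."""
    untagged = [a for a in articles if not a.nlm_tagged]
    tagged = [a for a in articles if a.nlm_tagged]
    # Likely NLM omissions: TM is most confident the design is present.
    top = sorted(untagged, key=lambda a: a.tm_score, reverse=True)[:n]
    # Likely NLM commissions: TM is most confident the design is absent.
    bottom = sorted(tagged, key=lambda a: a.tm_score)[:n]
    return top, bottom
```

Both lists would then go to human annotators as ground truth. Restricting attention to the extremes of the score distribution avoids committing to a single TRUE/FALSE cutoff.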

17
Dementia and Frailty Impact Postoperative Care Trajectories and Burden among Older Adults Undergoing Radical Cystectomy for Bladder Cancer

Ernandez, J.; Xiang, L.; Adler, R.; Hsu, J.; Shah, S. K.; Kim, D.; Gershman, B.; Mossanen, M.; Weissman, J. S.

2026-06-06 urology 10.64898/2026.06.04.26354768 medRxiv
Top 2%
0.3%
Show abstract

OBJECTIVE: Bladder cancer (BC) is predominantly a disease of older, comorbid adults, and radical cystectomy (RC), the gold-standard treatment, carries considerable morbidity. We sought to determine the impact of baseline dementia and frailty on the care trajectory beyond the immediate postoperative period. We hypothesized that frail patients and those with dementia undergoing RC for BC would have poorer care trajectories. METHODS AND MATERIALS: We identified Medicare beneficiaries aged ≥66 years who underwent RC for BC in 2017 with 12 months of pre- and post-RC enrollment. Frailty and dementia were characterized using validated, claims-based measures. Associations of baseline frailty and dementia with postoperative care trajectory outcomes were estimated using Fine-Gray competing risk models. RESULTS: We identified 3,600 beneficiaries, of whom 11.6% were frail and 3.4% met criteria for dementia. Patients with dementia were more likely to be frail and comorbid, and less likely to receive standard-of-care neoadjuvant chemotherapy. Frailty was independently associated with ≥2 transitions in care level after index discharge from RC, skilled nursing facility (SNF) admissions within 1 year of RC, exposure to intensive post-RC interventions (including dialysis and feeding tube placement), and poorer survival. Dementia remained associated with SNF admissions regardless of frailty level. CONCLUSIONS: Among a contemporary cohort of older adults undergoing RC for BC, preoperative dementia and frailty were independently associated with poorer care trajectories beyond the immediate postoperative period. Our work highlights a role for preoperative geriatric assessment in identifying and optimizing patients at greatest risk.

18
Alcohol Consumption Patterns and Sociodemographic Correlates Among US Adults with Cardiovascular Disease: A Cross-Sectional Analysis of All of Us and NHANES

Yang, Q.; Yu, J.; Zhao, H.; Zou, M.; Sun, Y.

2026-06-09 public and global health 10.64898/2026.06.06.26355052 medRxiv
Top 2%
0.3%
Show abstract

This cross-sectional study aimed to examine the prevalence of alcohol use and its sociodemographic correlates among adults with cardiovascular disease (CVD). We analyzed data from two large US cohorts: the All of Us Research Program (2017-2023) and the National Health and Nutrition Examination Survey (NHANES, 1999-2016). Both CVD diagnosis and past-year alcohol consumption were self-reported. Risky drinking was defined as exceeding moderate drinking or binge drinking (All of Us), or moderate/heavy drinking (NHANES). Multivariable logistic regression was used to examine associations with sociodemographic and lifestyle factors. Among 32,788 current drinkers with CVD in the All of Us cohort, 15% exceeded moderate drinking thresholds and 26% reported binge drinking. Older age, female sex, and higher socioeconomic status were inversely associated with risky drinking, while smoking was positively associated. In NHANES, moderate drinking rose from 47.3% to 57.2% and heavy drinking from 6.7% to 7.2% over the 1999-2016 survey period. Moderate/heavy drinking was positively associated with age <65 but inversely associated with age ≥65. Higher education and income were linked to moderate drinking, while current smoking was strongly associated with heavy drinking. These results highlight the need to integrate holistic screening for alcohol use, tobacco use, and social context into routine cardiovascular care.

19
Next-Generation Skin Cancer Detection Using Efficient Fuzzy Fusion of Genomic and Imaging Data

Molla, A. R.; Maity, A.; Saha, S.; Bhattacharya, R.; Chakraborty, A.; Biswas, S.; Nath, S.

2026-06-08 health informatics 10.64898/2026.06.05.26355024 medRxiv
Top 2%
0.3%
Show abstract

Skin cancer requires early detection for improved survival rates. Most existing methods rely on deep-learning-based image classification, whose performance is limited by visual similarity among lesions. Fewer studies use gene expression (GE) analysis, which captures molecular characteristics but lacks structural and visual details. To overcome the limitations of individual modalities, this paper proposes a multimodal framework that integrates dermoscopic images and GE profiles for skin cancer classification. EfficientNet and logistic regression are used for image-based analysis and genomic skin lesion profiling, respectively, followed by fuzzy rule-based decision systems that reduce uncertainty within each modality. Finally, fuzzy fusion combines the predictions of both modalities using uncertainty-based weighting of classifier outputs. In the experiments, the image-based and GE-based classification models each achieved an accuracy of nearly 92%; integrating their predictions through the proposed fuzzy fusion strategy further improved classification performance, to an overall accuracy of 94.25%. These results outperform contemporary methods, highlighting the benefit of combining complementary multimodal information over single-modality approaches.
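The abstract does not specify how its uncertainty-based weighting is computed. As one plausible illustration (an assumption, and a swapped-in technique rather than the authors' fuzzy rule system), the sketch below performs entropy-weighted late fusion: each modality's class-probability vector is weighted by its confidence, defined here as one minus its normalized Shannon entropy, so a near-uniform (uncertain) output contributes little. The function names and the `eps` guard are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a class-probability vector, scaled to [0, 1] by log(k)."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def entropy_weighted_fusion(
    image_probs: Sequence[float], ge_probs: Sequence[float], eps: float = 1e-9
) -> List[float]:
    """Fuse two modality outputs, weighting each by confidence = 1 - normalized entropy.

    eps keeps the weights positive so the result is still defined (a plain average)
    when both inputs are maximally uncertain.
    """
    if len(image_probs) != len(ge_probs):
        raise ValueError("both modalities must score the same classes")
    w_img = 1.0 - normalized_entropy(image_probs) + eps
    w_ge = 1.0 - normalized_entropy(ge_probs) + eps
    total = w_img + w_ge
    return [(w_img * a + w_ge * b) / total for a, b in zip(image_probs, ge_probs)]
```

For example, a confident image prediction of [0.9, 0.1] fused with a maximally uncertain gene-expression prediction of [0.5, 0.5] returns essentially [0.9, 0.1], because the uniform vector receives near-zero weight. The fused vector still sums to 1, since it is a convex combination of two probability vectors.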

20
Magnitude of Undernutrition and Associated Factors among Pregnant Women Attending Public Health Facilities in Goba District, Bale Zone, Oromia, Ethiopia: A cross-sectional Study

Ibrahim, S. M.; Lakew, M. S.; Amhare, A. F.; Hussein, D.; Kedir, H.; Abdulbesit, H.

2026-06-08 nutrition 10.64898/2026.06.05.26354999 medRxiv
Top 2%
0.3%
Show abstract

Objective: This study aimed to assess the magnitude of undernutrition and associated factors among pregnant women attending public health facilities in Goba district, Bale zone, Oromia Region, Ethiopia, in 2022. Design: An institution-based cross-sectional study design was used. Setting: The study was conducted in selected public health facilities from May to June 2022. Participants: The study population consisted of pregnant women who had lived in the study area for at least 6 months and who attended antenatal care follow-up at the selected public health facilities during the study period. Pregnant women who had lived in the study area for less than six months and those who were critically ill were excluded. Results: A total of 487 respondents participated in this study, a 100% response rate. More than half (50.7%) of pregnant mothers were undernourished. Factors significantly associated with maternal undernutrition during pregnancy were no formal education (AOR = 5.050; 95% CI: 1.470-17.346), a history of illness during pregnancy (AOR = 2.089; 95% CI: 1.246-3.504), eating three or fewer meals per day (AOR = 3.292; 95% CI: 1.040-10.42), poor nutritional knowledge (AOR = 5.588; 95% CI: 2.921-10.689), poor household (HH) wealth status (AOR = 4.774; 95% CI: 2.216-10.285), and having had four or more pregnancies (AOR = 0.852; 95% CI: 0.342-0.989). Conclusion: The magnitude of undernutrition among pregnant women was 50.7%. Significant associations with undernutrition were found for mothers with no formal education, poor dietary knowledge, a meal frequency of three or fewer times per day, a history of illness during pregnancy, lower and medium household wealth status, and four or more pregnancies, among women attending antenatal care (ANC) services at public health facilities.